Data visualization is a key component of any scientific journal or popular science article. Telling a compelling story using just the data at hand should be the goal of any figure.
In this primer, we will use the R package “ggplot2” for graphing. It will get us through the basics with a few types of figures and show how to modify them cleanly so they look a bit better. Other primers will focus more on the components of good figure-making practice.
For this primer, you will need the city_df.csv file.
ggplot2 is one of the most widely used R packages and graphics tools. It breaks everything into scales and layers, allowing quick manipulation of complex figure types. While simple figures are very easy to make, customization requires a more in-depth understanding of what is happening under the hood. Once you understand how to format your data in a manner ggplot works well with, it becomes largely intuitive.
Below are an open-source book and an open-source course on data visualization in R:
- https://clauswilke.com/dataviz/
- https://wilkelab.org/SDS375/
R basics for those needing a refresher:
- https://rpubs.com/chalsch/intro_to_R
R and RStudio are both free and open-source. R is the actual program, while RStudio is a user-friendly GUI (graphical user interface) for running R.
It is possible to run R in Jupyter Notebooks, but the RStudio interface is much more user-friendly.
Within R, you will need a few packages:
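The packages loaded with library() later in this primer are tidyverse, ggforce, ggsci, patchwork, and Hmisc. Below is a minimal sketch for checking which of them still need installing; the variable names are only for illustration, and the install line is left commented out so nothing is downloaded by accident.

```r
# Packages loaded with library() later in this primer
needed <- c("tidyverse", "ggforce", "ggsci", "patchwork", "Hmisc")

# requireNamespace() returns FALSE (without an error) if a package is not installed
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are installed.")
}
```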
Formatting your data for figures or analysis is critical. The two formats are known as long and wide. Most statistical packages and ggplot require long data. Wide data is commonly seen in species composition data, with one row per observation (for example, a plot at a site) and one column per species; each cell holds the count of that species in that observation. The long version of this would have a single species column and a single count column. This way we can pass in one variable to get summaries for all the species.
Example of wide and long data:
Notice the difference in the dimensions.
WIDE
## [1] "Wide dimensions: rows = 480 ,cols = 149"
## site year precip block nitrogen plot Agropyron.smithii Alliium.textile
## 1 mesic 2013 729.1 A 0.0 6 0 0
## 2 mesic 2013 729.1 A 2.5 8 0 0
## 3 mesic 2013 729.1 A 5.0 5 0 0
## 4 mesic 2013 729.1 A 7.5 7 0 0
## 5 mesic 2013 729.1 A 10.0 3 0 0
## Ambrosia.artemisiifolia Ambrosia.psilostachya
## 1 0 8
## 2 0 7
## 3 0 4
## 4 0 0
## 5 0 1
LONG
## [1] "Long dimensions: rows = 5007 ,cols = 8"
## site year precip block nitrogen plot species cover
## 1 mesic 2013 729.1 A 30 1 Ambrosia psilostachya 12
## 2 mesic 2013 729.1 A 30 1 Dalea candida 4
## 3 mesic 2013 729.1 A 30 1 Panicum virgatum 25
## 4 mesic 2013 729.1 A 30 1 Dichanthelium oligosanthes 7
## 5 mesic 2013 729.1 A 30 1 Andropogon gerardii 55
## 6 mesic 2013 729.1 A 30 1 Kuhnia eupatorioides 2
## 7 mesic 2013 729.1 A 30 1 Physalis pumila 2
## 8 mesic 2013 729.1 A 30 1 Sporobolus asper 4
## 9 mesic 2013 729.1 A 30 1 Solidago canadensis 6
## 10 mesic 2013 729.1 A 30 1 Asclepias verticillata 3
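The conversion from wide to long can be sketched with pivot_longer() from tidyr (part of the tidyverse). The small data frame here is made up for illustration and is not the species dataset shown above.

```r
library(tidyr)

# Toy wide data: one row per site, one column per species (made-up values)
wide_df <- data.frame(
  site = c("A", "B"),
  Agropyron.smithii = c(0, 3),
  Ambrosia.psilostachya = c(8, 1)
)

# Stack the species columns into one species column and one cover column
long_df <- pivot_longer(
  wide_df,
  cols = -site,
  names_to = "species",
  values_to = "cover"
)

dim(wide_df)  # 2 rows, 3 columns
dim(long_df)  # 4 rows, 3 columns
```

Every site and species combination becomes its own row, which is why long data has many more rows and far fewer columns.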
For this and the rest of the primer, you will need the city_df.csv file.
city_df.csv can be read in as is and contains various information on the 20 largest cities in each of WA, OR, CA, NV, and AZ.
Variable descriptions:
- State: two-letter abbreviations for the states
- City: name of city
- Lat: latitude of city
- Long: longitude of city
- Pop: city population
- Time: arbitrary time series column 1-20 (simulated for graphical purposes)
- Growth: arbitrary growth column (simulated for graphical purposes)
- Ppt: Annual precipitation using PRISM 30-year normals, 1981-2010
- Temp: Annual mean temperature using PRISM 30-year normals, 1981-2010
- Size: categorical city size variable based on population (small < 1,000,000 & large > 1,000,000)
The working directory should be the folder that holds this Markdown file and all other files.
library(tidyverse)
library(ggforce)
library(ggsci)
library(patchwork)
library(Hmisc)
setwd("~/g/projects/DataVis/JulieClass2022/")
Read in the city_df.csv data frame and look at the structure
city_df <- read.csv("city_df.csv", stringsAsFactors = TRUE)  # R >= 4.0 no longer converts strings to factors by default
str(city_df)
## 'data.frame': 100 obs. of 10 variables:
## $ State : Factor w/ 5 levels "AZ","CA","NV",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ City : Factor w/ 100 levels "Albany","Aloha",..: 66 22 73 50 10 33 35 7 19 84 ...
## $ Lat : num 45.5 44.1 44.9 42.3 44.1 ...
## $ Long : num -123 -123 -123 -123 -121 ...
## $ Pop : int 2074775 273439 266804 170876 109802 109381 109128 99037 67467 63230 ...
## $ Time : int 20 19 18 17 16 15 14 13 12 11 ...
## $ Growth: num 11820 10865 9689 8430 7556 ...
## $ Ppt : num 1104 1137 1009 509 293 ...
## $ Temp : num 11.92 11.35 11.67 12.12 8.12 ...
## $ Size : Factor w/ 2 levels "large","small": 1 2 2 2 2 2 2 2 2 2 ...
Factors are a special type of variable in R that can be very helpful but can also create a huge headache. If you are getting an error in R, factors are probably the first thing you should check.
Let’s look under the hood of the State factor…
From the structure view above, you can see that State contains 5 levels. If we look at State as an as.character() variable, we see just the state abbreviation.
as.character(city_df$State)[1:5]
## [1] "OR" "OR" "OR" "OR" "OR"
But if we look more closely at the factor and treat it with as.numeric(), we will see what the factor designation is doing.
as.numeric(city_df$State)[1:5]
## [1] 4 4 4 4 4
Factors code discrete variables with numbers, ordered alphabetically by default.
This is very important and will come in handy later with more complex figure making.
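The same behaviour can be seen on a small made-up vector, independent of city_df. This is only a sketch; the vector and object names are hypothetical.

```r
# Hypothetical vector of state abbreviations, deliberately not in alphabetical order
states <- factor(c("OR", "WA", "AZ", "OR"))

levels(states)      # "AZ" "OR" "WA": levels are sorted alphabetically
as.integer(states)  # 2 3 1 2: each value is stored as the position of its level

# Setting the levels explicitly changes the codes, not the labels
states_ordered <- factor(states, levels = c("WA", "OR", "AZ"))
as.integer(states_ordered)    # 2 1 3 2
as.character(states_ordered)  # "OR" "WA" "AZ" "OR"
```

ggplot uses this level order for legends and discrete axes, which is why reordering levels reorders the figure.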
aesthetics or aes()
ggplot maps variables to what are known as aesthetics, or aes(). Within the aesthetics you assign elements of your plot such as position (x, y), color, fill, shape, and size. These are automatically scaled based on your data.
For example, within city_df we want to plot Time vs. Growth:
ggplot(data=city_df,aes(x=Time,y=Growth))
This sets up the basic form of the plot, but we still need to tell ggplot how we want to draw the data.
layers
ggplot also works with layers, meaning a figure is built up in a layered/sequential manner. Let’s make some basic scatterplots with city_df to see how.
Functions
- geom_point(): adds points
- geom_line(): adds lines
- stat_smooth(): adds various trendlines
ggplot(data = city_df, aes(x = Time, y = Growth)) + geom_point()
This is a scatter plot, but let’s add colors for each state.
ggplot is really smart if you tell it the right things. It will automatically group the data based on the structure of a factor in your dataset and pick colors for you.
We will do this in the aes() in the top layer so it carries through the rest of the figure. Notice how the legend is sorted alphabetically because of the factor.
ggplot(data = city_df, aes(x = Time, y = Growth, color = State)) +
geom_point()
ggplot already knows we want to group based on state because we said so in the top layer, so we just add a line
ggplot(data = city_df, aes(x = Time, y = Growth, color = State)) +
geom_point() + geom_line()
stat_smooth common options:
- method = "": chooses the type of model used to fit the trend line (loess, lm, etc.)
- se = TRUE: draws a confidence band around the trend line (95% by default; change it with level)
See ?stat_smooth or Google for the rest of the options.
ggplot(data = city_df, aes(x = Time, y = Growth, colour = State)) +
geom_point() + stat_smooth()
hint: remember the layering
We can do this by confining the colour aesthetic within geom_point()
ggplot(data = city_df, aes(x = Time, y = Growth)) + geom_point(aes(colour = State)) +
stat_smooth()
Remember, ordering matters. Sometimes it might be nice for lines to be behind points, that is up to you.
ggplot(data = city_df, aes(x = Time, y = Growth)) + geom_point(aes(colour = State),
size = 4) + stat_smooth(linetype = "dashed", colour = "black",
size = 2)
ggplot automatically picks shapes for you based on the factor and adds them to the legend.
ggplot(data = city_df, aes(x = Time, y = Growth)) + geom_point(aes(colour = State,
shape = Size), size = 4) + stat_smooth(linetype = "dashed",
colour = "black", size = 2)
If you find a format you like, just use it as a template and change variables around.
Let’s make figures for:
- Lat vs. Ppt
- Lat vs. Temp
- Temp vs. Ppt
Let’s also pick a linear trend line instead of a curved loess.
# lat vs Ppt
ggplot(data = city_df, aes(x = Lat, y = Ppt)) + geom_point(aes(colour = State,
shape = Size), size = 4) + stat_smooth(method = "lm", linetype = "dashed",
colour = "black", size = 2)
# lat vs Temp
ggplot(data = city_df, aes(x = Lat, y = Temp)) + geom_point(aes(colour = State,
shape = Size), size = 4) + stat_smooth(method = "lm", linetype = "dashed",
colour = "black", size = 2)
# Temp vs Ppt
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(colour = State,
shape = Size), size = 4) + stat_smooth(method = "lm", linetype = "dashed",
colour = "black", size = 2)
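Because the three figures differ only in which columns are plotted, the template idea can also be sketched as a small function. scatter_lm() is a hypothetical helper, not part of ggplot2, and the toy data frame is made up so the example runs without city_df.csv.

```r
library(ggplot2)

# Hypothetical helper: scatterplot with a linear trend line for any two columns.
# x_var, y_var, and colour_var are column names given as strings.
scatter_lm <- function(df, x_var, y_var, colour_var) {
  ggplot(data = df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point(aes(colour = .data[[colour_var]]), size = 4) +
    stat_smooth(method = "lm", formula = y ~ x,
                linetype = "dashed", colour = "black") +
    labs(x = x_var, y = y_var, colour = colour_var)
}

# Made-up data standing in for city_df
toy_df <- data.frame(
  Lat = c(32, 36, 40, 44, 47),
  Temp = c(21, 17, 15, 11, 10),
  State = c("AZ", "NV", "CA", "OR", "WA")
)

p <- scatter_lm(toy_df, "Lat", "Temp", "State")
p
# With the real data: scatter_lm(city_df, "Lat", "Ppt", "State")
```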
Let’s try some various plots using categorical predictors and a continuous response.
Basic functions
- geom_bar(): adds bars
- geom_point(): adds points
- geom_errorbar(): adds error bars
- geom_boxplot(): adds a boxplot
- geom_violin(): adds violin plot
- geom_density(): adds density plot
Fancy functions
- stat_summary(): summarizes and plots data
- geom_sina(): in the ggforce library, plots raw data points in a violin/density shape
More aesthetic functions
- facet_wrap(): creates panels for different variables
If we are going to be using the base functions, we must first summarize the data in a meaningful way (mean, sd, N, 95% CI). Let’s plot Ppt and Temp by state
city_sum_df <- city_df %>%
group_by(State) %>%
summarise(Temp_mean = mean(Temp, na.rm = TRUE), #temp mean
Temp_sd = sd(Temp, na.rm = TRUE), #temp standard deviation
Temp_n = n(), #temp count
Ppt_mean = mean(Ppt, na.rm = TRUE), #Ppt mean
Ppt_sd = sd(Ppt, na.rm = TRUE), #Ppt standard deviation
Ppt_n = n()) %>% #Ppt count
mutate(Temp_se = Temp_sd / sqrt(Temp_n), #Temp standard error
Temp_lower.ci = Temp_mean - qt(1 - (0.05 / 2), Temp_n - 1) * Temp_se, #Temp lower 95% confidence interval
Temp_upper.ci = Temp_mean + qt(1 - (0.05 / 2), Temp_n - 1) * Temp_se, #Temp upper 95% confidence interval
Ppt_se = Ppt_sd / sqrt(Ppt_n), #Ppt standard error
Ppt_lower.ci = Ppt_mean - qt(1 - (0.05 / 2), Ppt_n - 1) * Ppt_se, #Ppt lower 95% confidence interval
Ppt_upper.ci = Ppt_mean + qt(1 - (0.05 / 2), Ppt_n - 1) * Ppt_se) #Ppt upper 95% confidence interval
city_sum_df
## # A tibble: 5 × 13
## State Temp_mean Temp_sd Temp_n Ppt_mean Ppt_sd Ppt_n Temp_se Temp_lower.ci
## <fct> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl>
## 1 AZ 20.9 3.56 20 249. 88.0 20 0.797 19.2
## 2 CA 17.2 1.93 20 355. 147. 20 0.431 16.3
## 3 NV 16.4 4.53 20 156. 54.2 20 1.01 14.3
## 4 OR 11.4 1.20 20 945. 269. 20 0.269 10.9
## 5 WA 10.9 0.776 20 812. 378. 20 0.174 10.5
## # … with 4 more variables: Temp_upper.ci <dbl>, Ppt_se <dbl>,
## # Ppt_lower.ci <dbl>, Ppt_upper.ci <dbl>
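The confidence interval formula used in the summary above (mean plus or minus a critical t value times the standard error) can be checked on a single vector with base R. mean_ci() is a hypothetical helper and the temperatures are made up. Unlike n(), which counts every row, this sketch drops missing values before counting.

```r
# Hypothetical helper: mean and t-based confidence interval for one numeric vector
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)                           # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)   # critical t value
  c(mean = mean(x),
    lower = mean(x) - t_crit * se,
    upper = mean(x) + t_crit * se)
}

# Made-up temperatures for illustration
temps <- c(11.9, 11.4, 11.7, 12.1, 8.1, 10.5)
mean_ci(temps)

# The same interval from base R's t.test()
t.test(temps)$conf.int

# With n = 20 the critical t value is about 2.09, slightly larger than
# the normal-approximation value of 1.96
qt(0.975, df = 19)
```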
#### TEMP ####
Temp_sum_df <- city_df %>%
group_by(State) %>%
summarise(mean = mean(Temp, na.rm = TRUE), #temp mean
sd = sd(Temp, na.rm = TRUE), #temp standard deviation
n = n()) %>% #temp count
mutate(se = sd / sqrt(n), #Temp standard error
ci = 1.96 * se) #Temp 95% confidence interval half-width (normal approximation)
Temp_sum_df$Clim <- 'Temp'
#### PPT ####
Ppt_sum_df <- city_df %>%
group_by(State) %>%
summarise(mean = mean(Ppt, na.rm = TRUE), #Ppt mean
sd = sd(Ppt, na.rm = TRUE), #Ppt standard deviation
n = n()) %>% #Ppt count
mutate(se = sd / sqrt(n), #Ppt standard error
ci = 1.96 * se) #Ppt 95% confidence interval half-width (normal approximation)
Ppt_sum_df$Clim <- 'Ppt'
#### combine data.frames ####
city_sum_df <- rbind(Temp_sum_df,Ppt_sum_df)
city_sum_df
## # A tibble: 10 × 7
## State mean sd n se ci Clim
## <fct> <dbl> <dbl> <int> <dbl> <dbl> <chr>
## 1 AZ 20.9 3.56 20 0.797 1.56 Temp
## 2 CA 17.2 1.93 20 0.431 0.845 Temp
## 3 NV 16.4 4.53 20 1.01 1.99 Temp
## 4 OR 11.4 1.20 20 0.269 0.527 Temp
## 5 WA 10.9 0.776 20 0.174 0.340 Temp
## 6 AZ 249. 88.0 20 19.7 38.6 Ppt
## 7 CA 355. 147. 20 32.8 64.4 Ppt
## 8 NV 156. 54.2 20 12.1 23.8 Ppt
## 9 OR 945. 269. 20 60.1 118. Ppt
## 10 WA 812. 378. 20 84.5 166. Ppt
We must set the stat within geom_bar() to ‘identity’, as the default is to plot counts (like a histogram).
ggplot(data = city_sum_df, aes(x = State, y = mean)) + geom_bar(stat = "identity")
ggplot(data = city_sum_df, aes(x = State, y = mean, fill = Clim)) +
geom_bar(stat = "identity")
To place the bars side by side instead of stacking them, we use the position argument.
ggplot(data = city_sum_df, aes(x = State, y = mean, fill = Clim)) +
geom_bar(stat = "identity", position = "dodge")
Let’s use facet_wrap() instead to get a better look at the comparisons among states.
options:
- nrow: number of rows
- ncol: number of columns
- scales=‘free’: different y scales for each panel (be careful… can be misleading)
ggplot(data = city_sum_df, aes(x = State, y = mean, fill = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_bar(stat = "identity")
What should the error bars show to convey differences effectively? (sd, se, or 95% CI)
ggplot(data = city_sum_df, aes(x = State, y = mean, fill = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci))
ggplot(data = city_sum_df, aes(x = State, y = mean, colour = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_point(size = 5) +
geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci), colour = "black")
Let’s use some of the built-in functions within ggplot to do the summarizing for us.
First, let’s put Ppt and Temp into long format.
city_long_df <- city_df %>%
pivot_longer(cols = c(Ppt, Temp), names_to = "Clim", values_to = "value")
city_long_df[1:10, ]
## # A tibble: 10 × 10
## State City Lat Long Pop Time Growth Size Clim value
## <fct> <fct> <dbl> <dbl> <int> <int> <dbl> <fct> <chr> <dbl>
## 1 OR Portland 45.5 -123. 2074775 20 11820. large Ppt 1104.
## 2 OR Portland 45.5 -123. 2074775 20 11820. large Temp 11.9
## 3 OR Eugene 44.1 -123. 273439 19 10865. small Ppt 1137.
## 4 OR Eugene 44.1 -123. 273439 19 10865. small Temp 11.4
## 5 OR Salem 44.9 -123. 266804 18 9689. small Ppt 1009.
## 6 OR Salem 44.9 -123. 266804 18 9689. small Temp 11.7
## 7 OR Medford 42.3 -123. 170876 17 8430. small Ppt 509.
## 8 OR Medford 42.3 -123. 170876 17 8430. small Temp 12.1
## 9 OR Bend 44.1 -121. 109802 16 7556. small Ppt 293.
## 10 OR Bend 44.1 -121. 109802 16 7556. small Temp 8.12
stat_summary() accepts many different fun or fun.data functions. Here we are using mean and mean_cl_boot. mean_cl_boot() is a ggplot2 wrapper around smean.cl.boot() from library(Hmisc), so Hmisc must be installed; it returns a bootstrapped 95% confidence interval.
ggplot(data = city_long_df, aes(x = State, y = value, fill = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + stat_summary(fun = mean,
geom = "bar") + stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
color = "black")
ggplot(data = city_long_df, aes(x = State, y = value, colour = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + stat_summary(fun.data = mean_cl_boot,
geom = "errorbar", color = "black") + stat_summary(fun = mean,
geom = "point", size = 5)
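You can also write your own summary function for fun.data. It needs to return a one-row data frame with columns y, ymin, and ymax. The sketch below uses the median and interquartile range; median_iqr() is a hypothetical name, and the plotting call is left as a comment because it needs city_long_df.

```r
# Hypothetical summary function for stat_summary(fun.data = ...):
# returns a one-row data frame with columns y, ymin, and ymax
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Check the function on its own before handing it to ggplot
median_iqr(1:9)

# Usage with the long data frame from this primer:
# ggplot(data = city_long_df, aes(x = State, y = value)) +
#   facet_wrap(~Clim, nrow = 1, scales = "free") +
#   stat_summary(fun.data = median_iqr, geom = "pointrange")
```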
If you can, it is always best to show the full data. Bar charts are especially misleading.
Below is an example of why it could be important to show all the data. We will revisit code to make these figures later.
Make a boxplot
ggplot(data = city_long_df, aes(x = State, y = value, fill = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_boxplot()
Make a violin plot
ggplot(data = city_long_df, aes(x = State, y = value, fill = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_violin(draw_quantiles = 0.5)
Just layer points then summary stats
ggplot(data = city_long_df, aes(x = State, y = value, colour = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_point(size = 4) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
color = "black") + stat_summary(fun = mean, geom = "point",
size = 5, colour = "black")
Use geom_sina() in library(ggforce) to arrange raw points in a violin-plot-style shape. This looks better the more data you have.
ggplot(data = city_long_df, aes(x = State, y = value, colour = Clim)) +
facet_wrap(~Clim, nrow = 1, scales = "free") + geom_sina(size = 3) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
color = "black") + stat_summary(fun = mean, geom = "point",
size = 5, colour = "black")
Honestly, base ggplot is pretty hideous, but it is fast and has a bunch of customizable features. We already learned about size, shape, colour, and fill. Let’s go more in depth.
Things covered:
- change factor label and order
- add x, y axis labels and title
- colour vs fill, different kinds of points
- limits
- scale
- theme
Start with a scatter plot of Temp vs Ppt and change State names and order
Components of factor:
- levels: the order of the categories; each must match an existing value exactly
- labels: new names, given in the same order as levels
levels(city_df$State)
## [1] "AZ" "CA" "NV" "OR" "WA"
If you don’t want to change the variable in the data frame, you can create a new one or just change the factor in the ggplot code
city_df$State_name <- factor(city_df$State, levels = c("WA",
"OR", "CA", "NV", "AZ"), labels = c("Washington", "Oregon",
"California", "Nevada", "Arizona"))
levels(city_df$State_name)
## [1] "Washington" "Oregon" "California" "Nevada" "Arizona"
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(colour = State_name),
size = 4) + stat_smooth(method = "lm", linetype = "dashed",
colour = "black", size = 2)
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(colour = State_name),
size = 4) + stat_smooth(method = "lm", linetype = "dashed",
colour = "black", size = 2) + xlab("Temperature") + ylab("Precipitation") +
ggtitle("Temp vs. Precip. by State")
Colour or color (ggplot recognizes both): points, lines, text, borders
Fill: anything with area
Points can be filled if you use one of the hollow point shapes, pch values 21-25.
Point styles are changed using pch; see shapes here (http://www.sthda.com/english/wiki/ggplot2-point-shapes)
Use fill to fill the empty circle and colour for the outline.
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(fill = State_name),
size = 4, pch = 21, colour = "black") + stat_smooth(method = "lm",
linetype = "dashed", colour = "black", size = 2) + xlab("Temperature") +
ylab("Precipitation") + ggtitle("Temp vs. Precip. by State")
There are hundreds of options but they are pretty intuitive and if you have an issue, someone else has as well. Google is your best friend.
We are just going to go through a couple of scales.
colour vs fill
Color selection is really important in figures; Jahner will cover this next week.
- scale_fill_manual()
- scale_colour_manual()
- scale_fill_npg() (I personally like this color palette for up to 10 discrete variables)
commands include:
- name: Legend title
- values: color codes (accepts hex codes or R color names)
shape
- scale_shape_manual()
commands include:
- name: Legend title
- values: pch codes
axis breaks and limits, discrete/continuous variables
- scale_x_continuous()
- scale_y_continuous()
commands include:
- name: axis title
- breaks: axis tick breaks
- labels: axis break labels
- limits: axis limits
- position: axis position
- expand: axis adjustment of margins
This isn’t a great-looking figure; it just shows what you can do with scales.
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(fill = State_name,
shape = State_name), size = 4, colour = "black") + stat_smooth(method = "lm",
linetype = "dashed", colour = "black", size = 2) + ggtitle("Temp vs. Precip. by State") +
scale_fill_manual(name = "State:", values = c("red", "blue",
"yellow", "green", "grey70")) + scale_shape_manual(name = "State:",
values = c(21, 22, 23, 24, 25)) + scale_x_continuous(name = "Temperature",
limits = c(5, 25), breaks = c(5, 10, 15, 20, 25), position = "top") +
scale_y_continuous(name = "Precipitation", limits = c(0,
1500), breaks = c(0, 250, 500, 1000, 1250, 1500))
This is how you get even more customization in your figure layout. theme() has many, many options and even some basic premade layouts. Most of the options change the position, size, font, face, color, etc. of just about anything to do with the basic layout of the figure.
Standard themes
https://ggplot2.tidyverse.org/reference/ggtheme.html
Just two are:
- theme_bw()
- theme_classic()
theme_bw()
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(fill = State_name,
shape = State_name), size = 4, colour = "black") + stat_smooth(method = "lm",
linetype = "dashed", colour = "black", size = 2) + ggtitle("Temp vs. Precip. by State") +
scale_fill_manual(name = "State:", values = c("red", "blue",
"yellow", "green", "grey70")) + scale_shape_manual(name = "State:",
values = c(21, 22, 23, 24, 25)) + scale_x_continuous(name = "Temperature",
limits = c(5, 25), breaks = c(5, 10, 15, 20, 25)) + scale_y_continuous(name = "Precipitation",
limits = c(0, 1500), breaks = c(0, 250, 500, 1000, 1250,
1500)) + theme_bw()
theme_classic()
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(fill = State_name,
shape = State_name), size = 4, colour = "black") + stat_smooth(method = "lm",
linetype = "dashed", colour = "black", size = 2) + ggtitle("Temp vs. Precip. by State") +
scale_fill_manual(name = "State:", values = c("red", "blue",
"yellow", "green", "grey70")) + scale_shape_manual(name = "State:",
values = c(21, 22, 23, 24, 25)) + scale_x_continuous(name = "Temperature",
limits = c(5, 25), breaks = c(5, 10, 15, 20, 25)) + scale_y_continuous(name = "Precipitation",
limits = c(0, 1500), breaks = c(0, 250, 500, 1000, 1250,
1500)) + theme_classic()
There are two basic need-to-know commands in theme:
- element_blank(): removes that element
- element_text(): edits text
Within element_text(), many things can be adjusted, and they work the same way throughout theme and other elements of ggplot, for example size, colour, face, and vjust.
Elements within theme that are commonly adjusted with element_text():
- axis.text
- axis.title
- axis.title.x
- axis.title.y
- axis.ticks (a line element, adjusted with element_line())
- plot.title
- legend.title
- legend.text
Other common adjustments:
- panel.grid
- panel.grid.major
- panel.grid.minor
- panel.border
- panel.spacing
- legend.position
Again, anything you want to know, someone else has asked online. Google is your best friend.
Here is a theme I commonly use and adjust as need be.
I suggest playing with the theme and just seeing what happens.
ggplot(data = city_df, aes(x = Temp, y = Ppt)) + geom_point(aes(fill = State_name,
shape = State_name), size = 4, colour = "black") + stat_smooth(method = "lm",
linetype = "dashed", colour = "black", size = 2) + scale_fill_npg(name = "State:") +
scale_shape_manual(name = "State:", values = c(21, 22, 23,
24, 25)) + scale_x_continuous(name = "Temperature") +
scale_y_continuous(name = "Precipitation", limits = c(0,
1500), breaks = c(0, 250, 500, 750, 1000, 1250, 1500)) +
theme_bw() + theme(legend.position = "bottom", plot.title = element_text(size = 20,
colour = "black", face = "bold"), axis.text = element_text(size = 13),
axis.title = element_text(size = 16, colour = "black", face = "bold"),
panel.border = element_rect(size = 1.5, colour = "black"),
legend.title = element_text(size = 16, colour = "black",
face = "bold", vjust = 1), legend.text = element_text(size = 13),
panel.grid.major = element_blank(), panel.grid.minor = element_blank())
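Since the same theme settings get reused across figures, one option is to store them in an object and add that object to each plot. The sketch below is loosely based on the theme above, with fewer settings; my_theme and the toy data frame are hypothetical names used only for illustration.

```r
library(ggplot2)

# Store the theme once so it can be added to any plot
my_theme <- theme_bw() +
  theme(
    legend.position = "bottom",
    axis.text = element_text(size = 13),
    axis.title = element_text(size = 16, colour = "black", face = "bold"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

# Made-up data standing in for city_df
toy_df <- data.frame(Temp = c(10, 15, 20), Ppt = c(900, 400, 200))

themed_plot <- ggplot(data = toy_df, aes(x = Temp, y = Ppt)) +
  geom_point(size = 4) +
  my_theme
themed_plot
```

Any later theme() call added after my_theme overrides the stored settings for that one figure.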
The easiest way to combine figures is with library(patchwork).
Another great function is ggarrange() in the package ggpubr; it is more customizable but harder to use.
First we need to make and assign the figures to patch together.
worst_plot <- ggplot(data = city_sum_df, aes(x = State, y = mean,
fill = Clim)) + geom_bar(stat = "identity", position = "dodge") +
ggtitle("Worst") + theme(title = element_text(size = 20,
colour = "black", face = "bold"))
worst_plot
very_bad_plot <- ggplot(data = city_long_df, aes(x = State, y = value,
fill = State)) + facet_wrap(~Clim, nrow = 1, scales = "free") +
stat_summary(fun = mean, geom = "bar") + stat_summary(fun.data = mean_cl_boot,
geom = "errorbar", color = "black") + scale_fill_npg() +
ggtitle("Very bad") + theme(legend.position = "none", title = element_text(size = 20,
colour = "black", face = "bold"))
very_bad_plot
bad_plot <- ggplot(data = city_long_df, aes(x = State, y = value,
fill = State)) + facet_wrap(~Clim, nrow = 1, scales = "free") +
geom_boxplot() + scale_fill_npg() + ggtitle("eh, bad") +
theme(legend.position = "none", title = element_text(size = 20,
colour = "black", face = "bold"))
bad_plot
okay_plot <- ggplot(data = city_long_df, aes(x = State, y = value,
fill = State)) + facet_wrap(~Clim, nrow = 1, scales = "free") +
geom_sina(size = 4, pch = 21) + stat_summary(fun.data = mean_cl_boot,
geom = "errorbar", color = "black") + stat_summary(fun = mean,
geom = "point", size = 5, colour = "black") + ggtitle("Okay") +
theme(legend.position = "none", title = element_text(size = 20,
colour = "black", face = "bold"))
okay_plot
worst_plot + very_bad_plot + bad_plot + okay_plot + plot_layout(ncol = 2,
nrow = 2)
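Besides + and plot_layout(), patchwork also provides | (side by side) and / (stacked) operators, plus plot_annotation() for panel tags. A minimal sketch with two placeholder plots built from made-up data:

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots built from made-up data
toy_df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
p1 <- ggplot(toy_df, aes(x = x, y = y)) + geom_point()
p2 <- ggplot(toy_df, aes(x = x, y = y)) + geom_line()

side_by_side <- p1 | p2  # panels in the same row
stacked <- p1 / p2       # panels in the same column

# Add panel tags (A, B, ...) to the combined figure
tagged <- (p1 | p2) + plot_annotation(tag_levels = "A")
tagged
```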
better_plot <- ggplot(data = city_long_df, aes(x = State, y = value,
fill = State)) + facet_wrap(~factor(Clim, labels = c("Precipitation",
"Temperature")), nrow = 1, scales = "free") + geom_sina(size = 4,
pch = 21) + stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
color = "black", width = 0.3, size = 1.4) + stat_summary(fun = mean,
geom = "point", size = 7, colour = "black", pch = 22, fill = "white") +
scale_fill_npg() + ggtitle("Better") + theme_bw() + theme(legend.position = "none",
plot.title = element_text(size = 26, colour = "black", face = "bold"),
axis.text = element_text(size = 18), axis.title = element_text(size = 22,
colour = "black", face = "bold"), panel.border = element_rect(size = 1.5,
colour = "black"), legend.title = element_text(size = 22,
colour = "black", face = "bold", vjust = 1), legend.text = element_text(size = 18),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
strip.text.x = element_text(size = 22, face = "bold"), strip.background = element_rect(size = 1.5,
colour = "#333333", fill = "#CCCCCC"))
better_plot
okay_plot + better_plot
#### TEMP ####
temp_df <- city_long_df[which(city_long_df$Clim == "Temp"), ]
temp_plot <- ggplot(data = temp_df, aes(x = State, y = value,
fill = State)) + geom_sina(size = 6, pch = 21, colour = "black") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
color = "black", width = 0.3, size = 1.4) + stat_summary(fun = mean,
geom = "point", size = 9, colour = "black", pch = 22, fill = "white") +
scale_fill_npg() + ylab("Temperature") + theme_bw() + theme(legend.position = "none",
plot.title = element_text(size = 26, colour = "black", face = "bold"),
axis.text = element_text(size = 18), axis.title = element_text(size = 22,
colour = "black", face = "bold"), axis.title.x = element_blank(),
panel.border = element_rect(size = 1.5, colour = "black"),
legend.title = element_text(size = 22, colour = "black",
face = "bold", vjust = 1), legend.text = element_text(size = 18),
panel.grid.major = element_blank(), panel.grid.minor = element_blank())
#### PPT ####
ppt_df <- city_long_df[which(city_long_df$Clim == "Ppt"), ]
ppt_plot <- ggplot(data = ppt_df, aes(x = State, y = value, fill = State)) +
geom_sina(size = 6, pch = 21, colour = "black") + stat_summary(fun.data = mean_cl_boot,
geom = "errorbar", color = "black", width = 0.3, size = 1.4) +
stat_summary(fun = mean, geom = "point", size = 9, colour = "black",
pch = 22, fill = "white") + scale_fill_npg() + scale_y_continuous(name = "Precipitation",
position = "right", breaks = c(250, 500, 750, 1000, 1250)) +
theme_bw() + theme(legend.position = "none", plot.title = element_text(size = 26,
colour = "black", face = "bold"), axis.text = element_text(size = 18),
axis.title = element_text(size = 22, colour = "black", face = "bold"),
axis.title.x = element_blank(), panel.border = element_rect(size = 1.5,
colour = "black"), legend.title = element_text(size = 22,
colour = "black", face = "bold", vjust = 1), legend.text = element_text(size = 18),
panel.grid.major = element_blank(), panel.grid.minor = element_blank())
# combine
bestest_plot <- temp_plot + ppt_plot
bestest_plot
make good choices!
worst_plot + bestest_plot